- The parameter vector can be made the output of a function.
- e.g. weight sharing with a "Y connector" (tying components of W together)
- the shared weights are forced to be equal; this is the basis of many architectural ideas
- the gradients of shared weights are summed during backpropagation (see the sketch below)
- Hypernetwork: weights of one network are computed as the outputs of another network. (will come back in a few weeks)
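A minimal PyTorch sketch of weight sharing (shapes are illustrative): the same weight tensor is used in two branches, and autograd sums the gradient contributions from both uses.

```python
import torch

# One shared weight used in two places ("Y connector").
w = torch.randn(3, requires_grad=True)
x1 = torch.randn(3)
x2 = torch.randn(3)

# The same w parameterizes both branches.
y = torch.dot(w, x1) + torch.dot(w, x2)
y.backward()

# Autograd accumulates (sums) the gradient from each use of w.
print(torch.allclose(w.grad, x1 + x2))  # True
```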
- Detecting a motif anywhere in an input
- e.g. detecting whether there is a speech signal
- use a detector that slides over the input, with every position sharing the same weights
- the outputs at all positions go into a max function
- similarly, a template can be swept over an image to detect motifs
- Shift invariance – output unchanged with shift in input
- Shift equivariance – the output shifts correspondingly when the input is shifted (see the sketch below)
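A small PyTorch sketch (tensor sizes are assumptions) of a shared-weight detector slid over a 1D input: the convolution output is shift-equivariant, and taking the max over positions gives a shift-invariant response.

```python
import torch
import torch.nn.functional as F

# A shared-weight detector slid over a 1D input (shape: batch, channels, length).
x = torch.zeros(1, 1, 10)
x[0, 0, 3] = 1.0                       # motif at position 3
w = torch.tensor([[[1.0, 2.0, 1.0]]])  # one detector, shared at every position

out = F.conv1d(x, w)                   # equivariant: shifting x shifts out
x_shift = torch.roll(x, 2, dims=-1)    # move the motif two steps to the right
out_shift = F.conv1d(x_shift, w)

print(torch.allclose(torch.roll(out, 2, dims=-1), out_shift))  # True (equivariance)
print(out.max() == out_shift.max())                            # True (max gives invariance)
```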
- Convolution: $y_i = \sum_j w_j\, x_{i-j}$
- the index runs backwards through the input window as it runs forward through the weights
- in 2D: $y_{ij} = \sum_{k,l} w_{kl}\, x_{i+k,\, j+l}$ (note the plus sign: this is really cross-correlation)
- Cross-correlation
- the index and the weights move forward together; this is what deep learning libraries actually compute under the name "convolution"
- w is the convolution kernel
- stride: move the window forward by more than one step at a time
- mismatches at the borders are generally handled by padding the input (see the sketch below)
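A quick sketch (shapes are illustrative) of how stride and padding change the output length; note that `F.conv1d` computes cross-correlation, so flipping the kernel gives the true convolution.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8)               # (batch, channels, length)
w = torch.randn(1, 1, 3)               # one kernel of size 3

print(F.conv1d(x, w).shape)            # length 8 - 3 + 1 = 6
print(F.conv1d(x, w, stride=2).shape)  # stride 2 -> length 3
print(F.conv1d(x, w, padding=1).shape) # "same" padding -> length 8

# F.conv1d is cross-correlation; flip the kernel to get a true convolution.
true_conv = F.conv1d(x, w.flip(-1))
```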
- Inspired by biology
- the brain recognizes objects in about 100 ms
- cells can be very specialized, yet invariant to irrelevant transformations
- simple cells detect local features
- complex cells pool outputs of simple cells
- a complex cell combines the outputs of all its simple sub-cells
- Architecture
- filter bank / non-linearity / pooling
- Modern architecture
- Normalization
- Filter bank
- Non-linearity
- Feature pooling (generally max pooling)
- max; Lp pooling (the p-th root of the sum of p-th powers); probability pooling
- essentially any function that returns the same value regardless of where the feature sits within the pooling window
- (repeat the normalization / filter bank / non-linearity / pooling block several times)
- Classifier
- Fully connected layers
- can be viewed as 1×1 convolutions (see the sketch below)
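A minimal sketch, assuming illustrative layer sizes, of the normalization / filter bank / non-linearity / pooling stage repeated twice, with the classifier written as a 1×1 convolution.

```python
import torch
import torch.nn as nn

def stage(c_in, c_out):
    # normalization -> filter bank (conv) -> non-linearity -> max pooling
    return nn.Sequential(
        nn.BatchNorm2d(c_in),
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

model = nn.Sequential(
    stage(1, 16),                      # repeat the stage several times
    stage(16, 32),
    nn.Conv2d(32, 10, kernel_size=1),  # "fully connected" classifier as a 1x1 conv
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                      # -> (batch, 10) logits
)

logits = model(torch.randn(8, 1, 28, 28))
print(logits.shape)                    # torch.Size([8, 10])
```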
- LATER: read LeCun et al. (1998) and implement it with PyTorch; also implement it with my own code (a sketch follows below)
- github.com/activatedgeek/LeNet-5
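A compact LeNet-5-style sketch in PyTorch, following LeCun et al. (1998) in spirit; the exact activation and pooling choices here are simplifications, not a faithful reproduction.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    # LeNet-5-style convnet for 32x32 single-channel inputs (e.g. padded MNIST).
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # C1: 6 feature maps, 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S2: 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # C3: 16 feature maps, 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S4: 5x5
            nn.Conv2d(16, 120, kernel_size=5), # C5: 120 feature maps, 1x1
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(120, 84),                # F6
            nn.Tanh(),
            nn.Linear(84, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```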
- Multiple character recognition
- slide the convnet over the input, shifting it across positions
- this gives a character prediction at each position
- but recomputing the network at every position becomes extremely wasteful
- Instead
- take a large input and keep convolving over the whole thing
- a convolutional layer then produces multiple outputs at once
- much cheaper than recomputing the network at every location
- another approach was to train the convnet to output the character at the middle of its viewing window (see the sketch below)
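A sketch (sizes assumed) of why convolving the whole input is cheap: a network made only of convolutions and pooling can take a wider input and directly produce one prediction per location, instead of being re-run on every crop.

```python
import torch
import torch.nn as nn

# A tiny fully convolutional "character detector" (illustrative sizes).
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 10, kernel_size=5),   # plays the role of the classifier
)

single = net(torch.randn(1, 1, 32, 32))   # one 32x32 window -> one prediction (1, 10, 1, 1)
wide = net(torch.randn(1, 1, 32, 128))    # a wide strip -> one prediction per location (1, 10, 1, 25)
print(single.shape, wide.shape)
```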
- Convnets are good for
- tasks that are shift invariant and distortion invariant
- where the sizes of the objects change a lot
- smaller or larger objects can be detected by applying the same convnet to progressively rescaled inputs
- multi-dimensional array signals
- with strong local correlations between values
- that become less correlated with distance
- features can appear anywhere, which is why shared weights make sense
- a fully connected net does not care about permutations of its input (and so cannot exploit this structure)
- Practicum
- signals can be represented as vectors, e.g. an audio waveform
- words can be represented as one-hot vectors; language has the same kind of structure
- Receptive field = how many neurons of the previous layer a given neuron sees
- sparsity (local connections) is justified only because the data exhibits locality
- stationarity – the same patterns appear again and again, so parameters can be shared
- parameter sharing leads to
- faster convergence
- better generalization
- not constrained on the input size (can keep shifting)
- kernel independence => high parallelization
- kernels on 1D data
- the 1D example uses 3 kernels (see the sketch below)
- odd-sized kernels, so the window extends evenly on both sides of the center
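A sketch of 1D convolution with 3 odd-sized kernels (sizes assumed); each output position has a receptive field of 3 input samples.

```python
import torch
import torch.nn as nn

# 3 kernels of (odd) size 3 over a 1-channel 1D signal.
conv = nn.Conv1d(in_channels=1, out_channels=3, kernel_size=3, padding=1)

x = torch.randn(1, 1, 100)   # (batch, channels, length); any length works
y = conv(x)
print(y.shape)               # torch.Size([1, 3, 100]): one feature map per kernel

# Each y[:, :, t] depends only on x[:, :, t-1:t+2] -> receptive field of 3.
```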
- Standard spatial CNN
- multiple layers of
- conv
- non-linearities
- pooling
- batch normalization
- residual bypass connections (see the sketch below)
- logits at the end for classification
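A minimal sketch (assumed channel counts) of a residual bypass connection wrapped around conv / batch-norm / non-linearity layers, with logits at the end for classification.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # conv -> batch norm -> ReLU -> conv -> batch norm, plus a bypass connection
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # residual bypass: add the input back

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    ResidualBlock(32),
    nn.MaxPool2d(2),
    ResidualBlock(32),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),          # logits for classification
)

print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```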
- Geoff Hinton – capsule networks